COMPRESSING OVERSIZED INFORMATION IN MARKOV CHAINS

GIACOMO ALETTI 



Abstract. Given a strongly stationary Markov chain and a finite set of stop- 
ping rules, we prove the existence of a polynomial algorithm which projects the 
Markov chain onto a minimal Markov chain without redundant information. 
Markov complexity is hence defined and tested on some classical problems. 



1. Introduction 

Let $X_n$ be a stationary Markov chain on a finite set $E$ with transition matrix
$P$. The Markov process stops when one of the given stopping rules occurs. The
problem of finding the stopping law may be solved by embedding the Markov chain
into another Markov chain on a larger state set (the tree made by both the states
and the stopping rules, see [3]). The desired law is then obtained from the transition
matrix of the new Markov chain.

Unfortunately, this new Markov chain may be so big that numerical computations
become impracticable. A new method ensuring the existence of
a projection of the Markov chain onto a "minimal" Markov chain which preserves
probabilities was presented in [2].

We recall now how this problem occurs in many situations.

(1) In finance, some filter rules for trading are special cases of the Markov chain
stopping rule suggested by the authors in [3].

(2) "When enough is enough!" For example, an insured has an accident only
once in a while. How many accidents in a specified number of years
should be used as a stopping time for the insured (in other words, when
should the insurance contract be discontinued)?

(3) State-dependent Markov chains. Namely, the transition probabilities are
given in terms of the history. For simplicity, consider the decision to stop if
we get 2 identical throws ($11, 22, 33, \dots, nn$) (for example, when $n = 2$, an
insured has two accidents in a row, one each year, and is discontinued,
or an insured has no accidents two years in a row and is therefore "promoted"
to a better class of insured). If the probability of a switch from $hm$ to
$mk$ is denoted by $p_{hm,mk}$, then the Markov transition matrix, indexed by
pairs of consecutive outcomes, has entries $p_{hm,mk}$ and
can be analyzed for the stopping time by the usual methods. Obviously,
in many situations (e.g., if $p_{hm,mk} = p_{m,k}$ $\forall h \neq m$), this matrix has
a special structure and can be reduced.
(4) Small-world networks. Given one of the networks in Figure 1, is it
possible to reduce it while preserving the law of reaching a given absorbing
state?




Figure 1. Networks that may be shrunk.

Formally, the problem is given by a triple $(E, T, P)$, where:

• $E$ is a set (set of states);

• $T$ is a nonempty subset of $E$ (target set);

• $P : E \times E \to \mathbb{R}_+$ with the following properties:

- $\forall e \in E$, $\sum P(e, \cdot) = 1$;

- $P^{-1}(0, \infty) \cap (T \times E) \subseteq (T \times T)$.

$P$ may be identified with the transition probability matrix with entries $P((e_i, e_j))$.

As shown in [3], compressing non-influential information is equivalent to finding a
triple $(F, t, P^*)$ and a map $\pi : E \to F$ s.t.

• $\pi$ is a surjective set function from $E$ to $F$;

• $t = \pi(T)$, $T = \pi^{-1}(t)$;

• the following diagram commutes:

(2)    [commutative diagram from $E \times \mathfrak{P}(E)$ to $\mathbb{R}_+ \cup \{0\}$, relating $P$ and $P^*$ through $\pi$]

where $\mathrm{Id}_E : E \to E$ is the identity map on $E$, $\mathfrak{P}(E)$ is the power set of $E$,
and $P(e, A) = \sum_{e_i \in A} P((e, e_i))$.

When the cardinality of $F$ is strictly less than the cardinality of $E$, we have reduced
some information: the subsets $\pi^{-1}(f)$, $f \in F$, of $E$ act in the same way for the target
problem.
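As an illustration, the compatibility requirement can be checked numerically. The sketch below assumes that diagram (2) amounts to the usual lumpability condition, namely that $P(e, \pi^{-1}(f))$ takes the same value for all $e$ in the same fibre of $\pi$. The function names `is_compatible` and `projected_matrix` and the 4-state chain are hypothetical, chosen only for this example.

```python
from fractions import Fraction

def is_compatible(P, blocks):
    """True if all states of a block send the same mass to every block."""
    for r in blocks:                       # block the jump arrives in
        for b in blocks:                   # block the jump starts from
            masses = {sum(P[e][x] for x in r) for e in b}
            if len(masses) > 1:
                return False
    return True

def projected_matrix(P, blocks):
    """Entries P*(b, r) = P(e, r), read off any representative e of block b."""
    return [[sum(P[min(b)][x] for x in r) for r in blocks] for b in blocks]

half, third = Fraction(1, 2), Fraction(1, 3)
# Hypothetical chain on E = {0, 1, 2, 3}; state 3 is the absorbing target T.
P = [[0, half, half, 0],
     [third, 0, third, third],
     [third, third, 0, third],
     [0, 0, 0, 1]]

good = [{0}, {1, 2}, {3}]      # states 1 and 2 act in the same way
bad = [{0, 1}, {2}, {3}]       # states 0 and 1 do not
print(is_compatible(P, good), is_compatible(P, bad))
print(projected_matrix(P, good))
```

Exact fractions are used so that equality of masses is not blurred by floating-point rounding.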

The proof of the existence of the optimal solution was based on the fact
that the set of compatible projections $\pi$ has a minimal majorant property. More
precisely, if $\pi$ is a projection from $E$ to another set $F$, let $R_\pi$ be the equivalence
relation on $E$ defined by $e_1 R_\pi e_2 \iff \pi(e_1) = \pi(e_2)$. If we define $\mathbb{E} :=
\{R_\pi : \pi \text{ satisfies (2)}\}$, in [3] it was proved that

$$\bigcap \{R : R_\pi \subseteq R, \ \forall R_\pi \in \mathbb{E}\} \in \mathbb{E}.$$



Unfortunately, finding a nontrivial $R_\pi \in \mathbb{E}$ is not a local search. In fact, we may
have $P(e_1, e_3) \neq P(e_2, e_3)$ but $P(e_1, \{e_3, e_4\}) = P(e_2, \{e_3, e_4\})$, which means that
$e_1 R_\pi e_2$ may be found only if we know that $e_3 R_\pi e_4$. Moreover, it is not difficult to
build examples where the only nontrivial element of $\mathbb{E}$ corresponds to the optimal
nontrivial projection. Therefore, searching for a compressing map $\pi$ appears as a
non-polynomial search, in the sense that we have to look at the whole set $\mathcal{E}$ of
equivalence relations on $E$. In fact, finding a reducing map means finding $R \in \mathcal{E}$ s.t.

• $\forall e_i \in T$: $e_i R e_j \iff e_j \in T$;

• $\forall \{e_i, e_j, e_k\} \subseteq E$: $e_i R e_j \implies \sum_{e_l R e_k} P(e_i, e_l) = \sum_{e_l R e_k} P(e_j, e_l)$.

The problem here is to find a polynomial algorithm for reaching the optimal projection
of the given Markov chain $(E, P)$ which preserves the probabilities of reaching
the target set $T$. Moreover, we extend this method to multi-target problems
$T = \{T_1, \dots, T_k\}$.



2. The Target Algorithm 

As in [3], we act on the set of equivalence relations on a set, but we will focus
our attention on $F$ instead of on $E$. For any (finite) set $A$, we denote by $\mathcal{A}$
the set of all equivalence relations on $A$. Moreover, given an equivalence relation
$R \in \mathcal{A}$, we denote by $A/R$ the quotient set of $A$ by $R$. We introduce a partial order
$\models$ on $\mathcal{A}$. Let $R, S \in \mathcal{A}$. We say that $R \models S$ if $a_1 R a_2$ implies $a_1 S a_2$ (if you think of $A$
as the set of all men, $R$ is "belonging to the same state" and $S$ is "belonging
to the same continent", then $R \models S$). The relation $\models$ is just set-theoretic inclusion
between equivalence relations, since any relation is a subset of $A \times A$. We denote by
$|A|$ the cardinality of a set $A$. We state the following trivial lemma without proof.

Lemma 1. Let $A$ be a set. $|\cdot|$ is monotone with respect to $\models$ in $\mathcal{A}$, i.e.
(3a) $\forall R, S \in \mathcal{A}$, $R \models S \implies |A/R| \geq |A/S|$.

Moreover, if $|A| < \infty$, $|\cdot|$ is strictly monotone:
(3b) $|A/R| = |A/S|$, $R \models S \implies R = S$.
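The order $\models$ and Lemma 1 can be made concrete on a toy set. In this minimal sketch an equivalence relation is represented by its list of classes; the helper name `is_finer` and the four hypothetical people are assumptions made only for illustration.

```python
def is_finer(R, S):
    """R |= S: every class of R is contained in a single class of S."""
    return all(any(r <= s for s in S) for r in R)

# Hypothetical set A of four people.
same_state = [{"ann", "bob"}, {"carl"}, {"dana"}]
same_continent = [{"ann", "bob", "carl"}, {"dana"}]

print(is_finer(same_state, same_continent))      # the finer relation has more classes
print(len(same_state) >= len(same_continent))    # as stated in (3a)
```

Since a relation that is finer and has the same number of classes must coincide with the coarser one (3b), counting classes is enough to detect when a decreasing sequence of relations has stopped changing.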



Let $(E, T, P)$ be a triple as above and let $\pi : E \to F$ be the optimal projection
(for existence and uniqueness, see [3]). The map $\pi$ is characterized by the
equivalence relation $R_\pi$ on $E$: $e_1 R_\pi e_2 \iff \pi(e_1) = \pi(e_2)$.

Let $\mathcal{F}_t$ be the set of all equivalence relations on $F$ such that the target state
$t \in F$ is left "alone", i.e. $R \in \mathcal{F}_t$ if $t R f \implies f = t$.

Note that $\mathcal{F}_t \hookrightarrow \mathcal{E}$; more precisely, since $\pi : E \to F$, we have:

$$\mathcal{F}_t \xrightarrow{\ j\ } \mathfrak{P}(F \times F) \xrightarrow{(\pi,\pi)^{-1}} \mathfrak{P}(E \times E).$$

It is obvious that $(\pi, \pi)^{-1} \circ j : \mathcal{F}_t \to \mathfrak{P}(E \times E)$ maps each $R \in \mathcal{F}_t$ to an equivalence relation
on $E$. With this inclusion in mind, we can state that $\mathcal{F}_t \subseteq \mathcal{E}$:

(4) $\mathcal{F}_t = \{R \in \mathcal{E} : R_\pi \models R\}$,

and hence we refer to $\mathcal{F}_t$ both as a class of equivalence relations on $F$ and on $E$.
The uniqueness of the optimal solution in [3] states that (4) is well-posed.
We call $I_F$ the identity relation on $F$:

$$f_1 \, I_F \, f_2 \iff f_1 = f_2,$$

i.e. $I_F$ is just $R_\pi$ on $\mathcal{F}_t$, and let $M_E$ be the maximal relation on $\mathcal{F}_t$:

$$e_1 \, M_E \, e_2 \iff \{e_1, e_2\} \subseteq T \ \text{or}\ \{e_1, e_2\} \subseteq (E \setminus T).$$

Clearly, $M_E \in \mathcal{F}_t$ and $I_F \models R \models M_E$, $\forall R \in \mathcal{F}_t$ (i.e., $I_F$ and $M_E$ are the minimal
and maximal relations on $\mathcal{F}_t$). Note that we can compute $M_E$ without knowing
$F$.

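We now build a monotone operator $\mathcal{P}$ on $\mathcal{E}$ (the idea of the algorithm is to reach
$I_F$, unknown, starting from $M_E$, known).
Let $\mathcal{P} : \mathcal{E} \to \mathcal{E}$ be defined as follows:

for any $R \in \mathcal{E}$, let $r_1, \dots, r_N$ be the equivalence classes of $E$ induced by $R$.
Define

$$e_1 \, \mathcal{P}_{r_i} \, e_2 \iff P(e_1, r_i) = P(e_2, r_i),$$
$$\mathcal{P}(R) = \bigcap_{i=1,\dots,N} \mathcal{P}_{r_i} \cap R.$$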
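One application of this operator can be sketched as a partition refinement: two states stay together only if they were already together in $R$ and send the same mass to every class of $R$. The function name `refine`, the list-of-classes representation and the 4-state chain below are assumptions made for this illustration, not the author's implementation.

```python
from fractions import Fraction

def refine(P, blocks):
    """One application of the operator to the relation whose classes are `blocks`."""
    split = {}
    for b_idx, block in enumerate(blocks):
        for e in block:
            # Keeping b_idx in the signature intersects with R itself;
            # the remaining entries are P(e, r_i) for every class r_i.
            signature = (b_idx,) + tuple(sum(P[e][x] for x in r) for r in blocks)
            split.setdefault(signature, set()).add(e)
    return [frozenset(s) for s in split.values()]

half, third = Fraction(1, 2), Fraction(1, 3)
# Hypothetical chain on E = {0, 1, 2, 3} with absorbing target T = {3}.
P = [[0, half, half, 0],
     [third, 0, third, third],
     [third, third, 0, third],
     [0, 0, 0, 1]]

M_E = [frozenset({3}), frozenset({0, 1, 2})]   # maximal relation: T versus E minus T
once = refine(P, M_E)
twice = refine(P, once)
print(sorted(sorted(b) for b in once))
print(sorted(sorted(b) for b in twice))
```

In this example the second application changes nothing, so the relation obtained after one step is already a fixed point.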

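Now, we focus our attention on the action of $\mathcal{P}$ on $\mathcal{F}_t$. First, we prove that $\mathcal{P} :
\mathcal{F}_t \to \mathcal{F}_t$, and then we show that the unique fixed point of $\mathcal{P} : \mathcal{F}_t \to \mathcal{F}_t$ is $I_F$.

Lemma 2. $\mathcal{P}|_{\mathcal{F}_t} : \mathcal{F}_t \to \mathcal{F}_t$ is a $\models$-monotone operator on $\mathcal{F}_t$.

Proof. Let $R \in \mathcal{F}_t \subseteq \mathcal{E}$. Since every $r_i$ is a subset of $F$, the existence of $P^*$ in
(2) ensures that $\mathcal{P}_{r_i} \in \mathcal{F}_t$. Therefore, $\mathcal{P}(R) \in \mathcal{F}_t$. Since $\mathcal{P}(R) \subseteq R$, it is a monotone
operator. □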

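Theorem 3. $I_F$ is the unique fixed point of $\mathcal{P} : \mathcal{F}_t \to \mathcal{F}_t$.

Proof. $R_\pi$ is trivially a fixed point for $\mathcal{P} : \mathcal{E} \to \mathcal{E}$ by (2), and then $I_F$ is a fixed
point for $\mathcal{P} : \mathcal{F}_t \to \mathcal{F}_t$.

Now, let $R \in \mathcal{F}_t$ s.t. $R = \mathcal{P}(R)$. Define the canonical map $\pi_R : F \to F/R$. We
have

• $(\pi_R \circ \pi)(E) = F/R$;

• $(\pi_R \circ \pi)(T) = \pi_R(\pi(T)) = \pi_R(t) = t$;

• $R \subseteq \mathcal{P}_{r_i}$ $\forall i$, and hence the following diagram commutes:

[commutative diagram analogous to (2), with the projection $\pi_R \circ \pi : E \to F/R$ and values in $\mathbb{R}_+ \cup \{0\}$]

Since $\pi$ is the optimal projection such that (2) holds, then $F/R = F$, i.e. $R =
I_F$. □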

Corollary 4. Let $(E, T, P)$ be given and let $N$ be the cardinality of $F$, i.e. $N =
|E/R_\pi|$. Then $\mathcal{P}^{N-2}(M_E) = R_\pi$, where $\mathcal{P}^n := (\mathcal{P} \circ \mathcal{P}^{n-1})$ and $\mathcal{P}^0$ is the identity
operator (i.e. $\mathcal{P}^0(R) = R$, $\forall R$).

Proof. First, note that $\mathcal{P}^n(M_E) \in \mathcal{F}_t$ $\forall n$ (by Lemma 2). Therefore, we may consider
$\mathcal{P}^n : \mathcal{F}_t \to \mathcal{F}_t$. We have $I_F \models \mathcal{P}^{n+1}(M_E) \models \mathcal{P}^n(M_E) \models M_E$ $\forall n$.
Let $C_n = |F/(\mathcal{P}^n(M_E))|$. We now prove by induction on $n$ that

(5) $\mathcal{P}^n(M_E) \neq I_F \implies C_n > n + 1$.

For $n = 0$, $C_0 = 2$ (otherwise $E = T$ and the problem is trivial). For the induction
step, if $\mathcal{P}^n(M_E) \neq I_F$, then $C_n > n + 1$. $\mathcal{P}$ is a monotone operator, so
$\mathcal{P}^{n+1}(M_E) \models \mathcal{P}^n(M_E)$ and hence $C_{n+1} \geq C_n$ by (3a). Now, if $C_{n+1} = C_n$, then
$\mathcal{P}^{n+1}(M_E) = \mathcal{P}^n(M_E)$ by (3b), which means that $\mathcal{P}^n(M_E) = I_F$ by Theorem 3.
Therefore, (5) holds.

If $\mathcal{P}^{N-3}(M_E) = I_F$, then $\mathcal{P}^{N-2}(M_E) = I_F$ by Theorem 3. As a consequence of
(5), if $\mathcal{P}^{N-3}(M_E) \neq I_F$, then $C_{N-2} \geq N = |F|$. Therefore, since $\mathcal{P}^{N-2}(M_E) \in \mathcal{F}_t$,
we have $\mathcal{P}^{N-2}(M_E) = I_F$, i.e. $\mathcal{P}^{N-2}(M_E) = R_\pi$. □

Remark 5. Note that the operator $\mathcal{P}$ may be computed in $|E|$-polynomial time.
Corollary 4 ensures that

$$\underbrace{\mathcal{P} \circ \mathcal{P} \circ \cdots \circ \mathcal{P}}_{\text{at most } |E/R_\pi| - 2 \text{ times } (< |E|)}$$

will reach $R_\pi$, given any triple $(E, T, P)$. A Matlab version of such an algorithm
for multitarget $T$ may be downloaded at http://www.mat.unimi.it/~aletti
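The iteration described in the remark can be sketched in a few lines. This is not the Matlab code mentioned above; it is a minimal Python illustration under the assumption that relations are stored as lists of classes, with hypothetical names `refine` and `target_algorithm` and a hypothetical 5-state walk as input.

```python
from fractions import Fraction

def refine(P, blocks):
    """One application of the operator: split classes by the masses sent to each class."""
    split = {}
    for b_idx, block in enumerate(blocks):
        for e in block:
            signature = (b_idx,) + tuple(sum(P[e][x] for x in r) for r in blocks)
            split.setdefault(signature, set()).add(e)
    return [frozenset(s) for s in split.values()]

def target_algorithm(P, T):
    """Iterate from M_E until the number of classes stops growing, cf. (3b)."""
    T = frozenset(T)
    blocks = [b for b in (T, frozenset(range(len(P))) - T) if b]
    steps = 0                                  # applications that changed the relation
    while True:
        finer = refine(P, blocks)
        if len(finer) == len(blocks):
            return blocks, steps
        blocks, steps = finer, steps + 1

half = Fraction(1, 2)
# Hypothetical walk on the path 0-1-2-3-4, reflected at the ends, absorbed at T = {2}.
P = [[0, 1, 0, 0, 0],
     [half, 0, half, 0, 0],
     [0, 0, 1, 0, 0],
     [0, 0, half, 0, half],
     [0, 0, 0, 1, 0]]

classes, steps = target_algorithm(P, {2})
print(sorted(sorted(c) for c in classes), steps)
```

Here the symmetric states are merged, three classes remain, and the number of effective steps does not exceed the bound $N - 2$ of Corollary 4. Each call to `refine` costs a number of operations polynomial in $|E|$.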
3. Extension to multiple targets and examples

The previous results and those in [2] may be extended to multiple-target problems.
More precisely, let $X$ be a stationary Markov chain on a finite set $E$ and
let $T = \{T_1, \dots, T_k\}$ be the absorbing disjoint classes of targets. We are
interested in computing the probability of reaching $T_i$ by time $\tau$, given the
initial distribution $\mu$ on $E$. If $(\Omega, \mathcal{F}, \mathrm{Prob})$ is the underlying probability space, we
are accordingly interested in

(6) $\left[ \mathrm{Prob}\left( \bigcup_{m=0}^{\tau} \{\omega \in \Omega : X_m(\omega) \in T_i\} \right) \right]_{i=1,\dots,k}$

under the assumption that $\mathrm{Prob}(\{X_0 = e\}) = \mu(e)$.

The problem is the following: is there a "minimum" set $F$ such that the problem
may be projected to a problem on a Markov chain on $F$, for any initial distribution
$\mu$ on $E$?

The answer is trivial, since each target class $T_i$ defines its equivalence relation
$I_{F_i}$. It is not difficult to show that the required set $F$ is defined by

$$F = E/I_F, \quad \text{where } I_F = \bigcap_{i=1,\dots,k} I_{F_i}.$$

Definition 6. We call Markov complexity of the problem (E, T, P) the cardinality 
of the optimal set F . 
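The definition can be turned into a small computation: compress the chain with respect to each target class separately, intersect the resulting relations, and count the classes. The sketch below follows that recipe; the names `refine`, `compress`, `markov_complexity` and the 6-state chain are hypothetical and serve only as an illustration.

```python
from fractions import Fraction

def refine(P, blocks):
    """Split classes by the masses their states send to every current class."""
    split = {}
    for b_idx, block in enumerate(blocks):
        for e in block:
            signature = (b_idx,) + tuple(sum(P[e][x] for x in r) for r in blocks)
            split.setdefault(signature, set()).add(e)
    return [frozenset(s) for s in split.values()]

def compress(P, T):
    """Classes of the optimal relation for a single target class T."""
    T = frozenset(T)
    blocks = [b for b in (T, frozenset(range(len(P))) - T) if b]
    while True:
        finer = refine(P, blocks)
        if len(finer) == len(blocks):
            return blocks
        blocks = finer

def markov_complexity(P, targets):
    """Number of classes of the intersection of the single-target relations."""
    classes = [frozenset(range(len(P)))]
    for T in targets:
        single = compress(P, T)
        classes = [c & s for c in classes for s in single if c & s]
    return len(classes)

half = Fraction(1, 2)
# Hypothetical chain: states 0..3, targets T_1 = {4} and T_2 = {5}; state 3 is a trap.
P = [[0, 0, 0, 0, half, half],
     [0, 0, 0, 0, half, half],
     [0, 0, 0, half, half, 0],
     [0, 0, 0, 1, 0, 0],
     [0, 0, 0, 0, 1, 0],
     [0, 0, 0, 0, 0, 1]]

print(len(compress(P, {4})), len(compress(P, {5})), markov_complexity(P, [{4}, {5}]))
```

In this example each target alone allows a reduction to three classes, while the two targets together only allow the twin states 0 and 1 to be merged.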

Remark 7. The condition $P^{-1}(0, \infty) \cap (T_i \times E) \subseteq (T_i \times T_i)$ ensures that each $T_i$
is an absorbing class. In fact, this assumption allows one to compute (6) by $P^\tau$. If we
are interested in the probability of being in a target set $T_i$ at time $\tau$, this condition
may be dropped, leaving the compressing problem unchanged.
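When the targets are absorbing, the vector (6) is simply the mass that $\mu P^\tau$ puts on each $T_i$. A minimal sketch of this computation follows; the function name `reach_by` and the 3-state chain are assumptions made for illustration.

```python
from fractions import Fraction

def reach_by(P, mu, targets, tau):
    """Probability of having reached each absorbing class T_i by time tau."""
    n = len(P)
    dist = list(mu)
    for _ in range(tau):                       # dist <- dist * P, repeated tau times
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return [sum(dist[e] for e in T) for T in targets]

quarter, half = Fraction(1, 4), Fraction(1, 2)
# Hypothetical chain: from state 0 stay with probability 1/2,
# or jump to T_1 = {1} or T_2 = {2} with probability 1/4 each.
P = [[half, quarter, quarter],
     [0, 1, 0],
     [0, 0, 1]]
mu = [1, 0, 0]

print(reach_by(P, mu, [{1}, {2}], 2))
```

The same function applied to a compressed chain, with the projected initial distribution, would return the same values; this is the sense in which the projection preserves probabilities.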

We start here by showing some "irreducible" classical problems. 

Example 8 (Negative Binomial Distribution). Repeat independently a game with
probability $p$ of winning until you win $n$ games.

Let $S_n = \sum_{i=1}^{n} Y_i$, where $\{Y_i, i \in \mathbb{N}\}$ is a sequence of i.i.d. Bernoulli random
variables with $\mathrm{Prob}(\{Y_i = 1\}) = 1 - \mathrm{Prob}(\{Y_i = 0\}) = p$. We are interested
in computing the probability of reaching $n$ starting from 0. Let $E =
\{0, 1, \dots, n\}$ be the set of levels we have reached. We have
$P(i, i+1) = p$ and $P(i, i) = 1 - p$ for $0 \leq i < n$, $P(n, n) = 1$, and $P = 0$ elsewhere.
Since the length of the minimum path for reaching the target state $n$ from different
states is different, the problem is irreducible by [3, Proposition 31]. Its Markov
complexity is $n + 1$.

Example 9 (Consecutive winning). Repeat independently a game with probability
$p$ of winning until you win $n$ consecutive games.

The problem is similar to the previous one, where now a loss sends the chain back to level 0:
$P(i, i+1) = p$ and $P(i, 0) = 1 - p$ for $0 \leq i < n$, and $P(n, n) = 1$.
The problem is again irreducible by [3, Proposition 31]. Its Markov complexity is
$n + 1$.
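The irreducibility of Examples 8 and 9 can be checked numerically for a given $n$ and $p$. The sketch below builds both chains and counts the classes left by the refinement procedure; the helper names and the values $n = 5$, $p = 1/3$ are hypothetical choices for illustration.

```python
from fractions import Fraction

def refine(P, blocks):
    """Split classes by the masses their states send to every current class."""
    split = {}
    for b_idx, block in enumerate(blocks):
        for e in block:
            signature = (b_idx,) + tuple(sum(P[e][x] for x in r) for r in blocks)
            split.setdefault(signature, set()).add(e)
    return [frozenset(s) for s in split.values()]

def compress(P, T):
    """Iterate the refinement from {T, E minus T} until it stabilizes."""
    T = frozenset(T)
    blocks = [b for b in (T, frozenset(range(len(P))) - T) if b]
    while True:
        finer = refine(P, blocks)
        if len(finer) == len(blocks):
            return blocks
        blocks = finer

def negative_binomial_chain(n, p):
    """Levels 0..n: a win moves up one level, a loss keeps the level."""
    P = [[Fraction(0)] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        P[i][i + 1] = p
        P[i][i] = 1 - p
    P[n][n] = Fraction(1)
    return P

def consecutive_wins_chain(n, p):
    """Levels 0..n: a win moves up one level, a loss resets to level 0."""
    P = [[Fraction(0)] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        P[i][i + 1] = p
        P[i][0] += 1 - p
    P[n][n] = Fraction(1)
    return P

n, p = 5, Fraction(1, 3)
print(len(compress(negative_binomial_chain(n, p), {n})))
print(len(compress(consecutive_wins_chain(n, p), {n})))
```

In both cases no two levels are merged, so the number of classes equals the number of states, $n + 1$.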

Example 10 (Gambler's ruin). Let two players each have a finite number of pennies
(say, $n_1$ for player one and $n_2$ for player two). Now, flip one of the pennies
(from either player), with the first player having probability $p$ of winning, and transfer
a penny from the loser to the winner. Now repeat the process until one player
has all the pennies.

Let $S_n = \sum_{i=1}^{n} (2 Y_i - 1)$, where $\{Y_i, i \in \mathbb{N}\}$ is a sequence of i.i.d. Bernoulli random
variables with $\mathrm{Prob}(\{Y_i = 1\}) = 1 - \mathrm{Prob}(\{Y_i = 0\}) = p$. We are interested
in computing the probability of reaching $T_1 = n_2$ or $T_2 = -n_1$ (multiple
target) starting from 0. Let $E = \{-n_1, \dots, -1, 0, 1, \dots, n_2\}$ be the set of levels we
have reached. We have
$P(i, i+1) = p$ and $P(i, i-1) = 1 - p$ for $-n_1 < i < n_2$, and $P(-n_1, -n_1) = P(n_2, n_2) = 1$.
This problem is clearly irreducible, since it is for $T_2$ (for example). The problem
may be reduced if and only if we are interested in the time of stopping (without
knowing who wins, i.e. $T = T_1 \cup T_2$) and $p = 1/2$. In this case, the relevant
information is the distance from the nearest border and hence the problem may be
half-reduced.
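The three claims of this example can be observed on a small instance. The sketch below assumes $n_1 = 2$, $n_2 = 3$ and stores level $i$ at index $i + n_1$; the helper names are hypothetical.

```python
from fractions import Fraction

def refine(P, blocks):
    """Split classes by the masses their states send to every current class."""
    split = {}
    for b_idx, block in enumerate(blocks):
        for e in block:
            signature = (b_idx,) + tuple(sum(P[e][x] for x in r) for r in blocks)
            split.setdefault(signature, set()).add(e)
    return [frozenset(s) for s in split.values()]

def compress(P, T):
    """Iterate the refinement from {T, E minus T} until it stabilizes."""
    T = frozenset(T)
    blocks = [b for b in (T, frozenset(range(len(P))) - T) if b]
    while True:
        finer = refine(P, blocks)
        if len(finer) == len(blocks):
            return blocks
        blocks = finer

def gambler_chain(n1, n2, p):
    """Walk on levels -n1..n2 (index = level + n1), absorbed at both borders."""
    size = n1 + n2 + 1
    P = [[Fraction(0)] * size for _ in range(size)]
    P[0][0] = P[size - 1][size - 1] = Fraction(1)
    for i in range(1, size - 1):
        P[i][i + 1] = p
        P[i][i - 1] = 1 - p
    return P

fair = gambler_chain(2, 3, Fraction(1, 2))
biased = gambler_chain(2, 3, Fraction(1, 3))

print(len(compress(fair, {0})))        # one border as target: nothing is merged
print(len(compress(fair, {0, 5})))     # stopping time only, p = 1/2
print(len(compress(biased, {0, 5})))   # stopping time only, p different from 1/2
```

With the merged target and $p = 1/2$ the interior levels are grouped by their distance from the nearest border, while for the biased game every interior level remains in a class of its own.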



The following classical problem may be reduced. 

Example 11 (Random walk on a cube). A particle performs a symmetric random
walk on the vertices of a unit cube, i.e., the eight possible positions of the particle
are $(0,0,0)$, $(1,0,0)$, $(0,1,0)$, $(0,0,1)$, $(1,1,0)$, ..., $(1,1,1)$, and from its current
position, the particle has a probability of $1/3$ of moving to each of the 3 neighboring
vertices. This process ends when the particle reaches $(0,0,0)$ or $(1,1,1)$.

Let $T_1 = (0,0,0)$, $T_2 = (1,1,1)$. The following transition matrix

            (0,0,0) (1,0,0) (0,1,0) (0,0,1) (1,1,0) (1,0,1) (0,1,1) (1,1,1)
  (0,0,0)      1       0       0       0       0       0       0       0
  (1,0,0)     1/3      0       0       0      1/3     1/3      0       0
  (0,1,0)     1/3      0       0       0      1/3      0      1/3      0
  (0,0,1)     1/3      0       0       0       0      1/3     1/3      0
  (1,1,0)      0      1/3     1/3      0       0       0       0      1/3
  (1,0,1)      0      1/3      0      1/3      0       0       0      1/3
  (0,1,1)      0       0      1/3     1/3      0       0       0      1/3
  (1,1,1)      0       0       0       0       0       0       0       1

can be easily reduced to

          t_1    f_1    f_2    t_2
  t_1      1      0      0      0
  f_1     1/3     0     2/3     0
  f_2      0     2/3     0     1/3
  t_2      0      0      0      1

where $t_i = T_i$ and $f_i = \{e = (e_1, e_2, e_3) : \sum_j e_j = i\}$; i.e. its Markov complexity
is 4. If we are only interested in the time of stopping (i.e. $T = T_1 \cup T_2$), the
previous problem may be reduced to a geometrical one (Markov complexity equal
to 2). Clearly, these results hold also for a random walk on a $d$-dimensional cube.
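Both complexities stated in this example can be recomputed from the description of the walk. In the sketch below the two-target case is handled by intersecting the single-target relations, as in the definition of Markov complexity; all helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def refine(P, blocks):
    """Split classes by the masses their states send to every current class."""
    split = {}
    for b_idx, block in enumerate(blocks):
        for e in block:
            signature = (b_idx,) + tuple(sum(P[e][x] for x in r) for r in blocks)
            split.setdefault(signature, set()).add(e)
    return [frozenset(s) for s in split.values()]

def compress(P, T):
    """Iterate the refinement from {T, E minus T} until it stabilizes."""
    T = frozenset(T)
    blocks = [b for b in (T, frozenset(range(len(P))) - T) if b]
    while True:
        finer = refine(P, blocks)
        if len(finer) == len(blocks):
            return blocks
        blocks = finer

def cube_chain():
    """Symmetric walk on the 3-cube, absorbed at (0,0,0) and (1,1,1)."""
    vertices = list(product((0, 1), repeat=3))
    index = {v: k for k, v in enumerate(vertices)}
    P = [[Fraction(0)] * 8 for _ in range(8)]
    for v in vertices:
        if sum(v) in (0, 3):
            P[index[v]][index[v]] = Fraction(1)
            continue
        for axis in range(3):                  # flip one coordinate at a time
            w = list(v)
            w[axis] ^= 1
            P[index[v]][index[tuple(w)]] = Fraction(1, 3)
    return P, index[(0, 0, 0)], index[(1, 1, 1)]

P, t1, t2 = cube_chain()
first, second = compress(P, {t1}), compress(P, {t2})
two_targets = [a & b for a in first for b in second if a & b]
merged_target = compress(P, {t1, t2})
print(len(two_targets), len(merged_target))
```

The four classes found for the two-target problem are the vertices grouped by the number of coordinates equal to 1, in agreement with the sets $f_i$ above.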

Example 12 (Coupon Collector's Problem). Let $n$ objects $\{e_1, \dots, e_n\}$ be picked
repeatedly with probability $p_i$ that object $e_i$ is picked on a given try, with $\sum_i p_i = 1$.
Find the earliest time at which all $n$ objects have been picked at least once.

Let $\Lambda$ be the set of permutations of the $n$ objects. For a fixed permutation
$\lambda = (e_{\lambda_1}, e_{\lambda_2}, \dots, e_{\lambda_n}) \in \Lambda$ we denote by $E^i_\lambda = \{e_{\lambda_1}, e_{\lambda_2}, \dots, e_{\lambda_i}\}$ the set of the
first $i$ objects in $\lambda$ (without order!).

Now, let $A_\lambda$ be the set of all the paths that have picked all the $n$ objects with
the order given by $\lambda$. In the Pattern-Matching Algorithms framework (see [3, Section 3
and Remark 18]), the stopping $\lambda$-rule we consider here is denoted by

$$T_\lambda = e_{\lambda_1} \{E^1_\lambda\}^* e_{\lambda_2} \{E^2_\lambda\}^* \cdots e_{\lambda_{n-1}} \{E^{n-1}_\lambda\}^* e_{\lambda_n},$$

and it becomes a target state of an embedded Markov problem on a graph (see
[3, Section 3]). The stopping class for the Coupon Collector's Problem is accordingly
$T = \bigcup_{\lambda \in \Lambda} T_\lambda$.

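It is not difficult to show that the general Coupon Collector's Problem may
be embedded into a Markov network of $2^n - 1$ nodes (its general Markov hard
complexity), where $E = \{T, \{E^i_\lambda : \lambda \in \Lambda, 1 \leq i < n\}\}$, the transition matrix is given
by

$$P(E^i_\lambda, E^j_\zeta) = \begin{cases} \sum_{k : e_k \in E^i_\lambda} p_k, & \text{if } E^j_\zeta = E^i_\lambda; \\ p_k, & \text{if } j = i + 1 \text{ and } E^j_\zeta = E^i_\lambda \cup \{e_k\}; \\ 0, & \text{otherwise}; \end{cases}$$

$\lambda, \zeta \in \Lambda$ and $\mathrm{Prob}(\{X_1 = e_k\}) = p_k$. Note that this matrix is not in general
reducible.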
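The embedding can be sketched by taking as states the nonempty sets of objects already collected, with the full set playing the role of $T$. The construction below is one reading of the transition rule above; the names `coupon_chain`, `refine`, `compress` and the chosen probabilities are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def refine(P, blocks):
    """Split classes by the masses their states send to every current class."""
    split = {}
    for b_idx, block in enumerate(blocks):
        for e in block:
            signature = (b_idx,) + tuple(sum(P[e][x] for x in r) for r in blocks)
            split.setdefault(signature, set()).add(e)
    return [frozenset(s) for s in split.values()]

def compress(P, T):
    """Iterate the refinement from {T, E minus T} until it stabilizes."""
    T = frozenset(T)
    blocks = [b for b in (T, frozenset(range(len(P))) - T) if b]
    while True:
        finer = refine(P, blocks)
        if len(finer) == len(blocks):
            return blocks
        blocks = finer

def coupon_chain(p):
    """States are the nonempty subsets of collected objects; the full set is T."""
    n = len(p)
    full = frozenset(range(n))
    states = [frozenset(c) for size in range(1, n + 1)
              for c in combinations(range(n), size)]
    index = {s: k for k, s in enumerate(states)}
    P = [[Fraction(0)] * len(states) for _ in states]
    for s in states:
        if s == full:
            P[index[s]][index[s]] = Fraction(1)
            continue
        for k in range(n):                     # picking k keeps s or enlarges it
            P[index[s]][index[s | {k}]] += p[k]
    return P, index[full]

distinct = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
uniform = [Fraction(1, 3)] * 3

for p in (distinct, uniform):
    P, target = coupon_chain(p)
    print(len(P), len(compress(P, {target})))
```

For three objects the chain has $2^3 - 1 = 7$ states; with pairwise distinct probabilities none of them can be merged, whereas with equal probabilities only the number of collected objects matters.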

If some $p_i$ are equal, i.e., when some states act with the same law with respect
to the problem, the set $E$ can be projected onto a smaller one. The easiest case
(namely, $p_i = 1/n$ $\forall i$) is projected onto an $n$-state problem with states $f_1, f_2, \dots, f_{n-1}, T$ (in this order) and transition matrix

$$P = \begin{pmatrix} 1/n & 1 - 1/n & & & \\ & 2/n & 1 - 2/n & & \\ & & 3/n & \ddots & \\ & & & \ddots & 1/n \\ & & & & 1 \end{pmatrix}$$

with $\mathrm{Prob}(\{X_1 = f_1\}) = 1$. Here, $f_i = \{E^i_\lambda : \lambda \in \Lambda\}$. The problem is again
irreducible by [3, Proposition 31] and its Markov complexity is $n$. In general,
when we have $m$ different values of $\{p_i, i = 1, \dots, n\}$ (namely, $q_1, \dots, q_m$), if $n_j =
|\{k : p_k = q_j\}|$, then the Markov complexity can be easily proven to be $\prod_{k=1}^{m} (n_k +$